supermarket_sales.csv
This dataset is found on kaggle.com. Kaggle is a platform included vast collection of high-quality datasets spanning various domains. These datasets are for data enthusiasts worldwide to study and practice.
This data set includes the sales data of a supermarket in Yangon, Naypyitaw and Mandalay in Myanmar in the past three months. Through this data set, we can analyze the customer behavior of the supermarket and gain insights from it, so as to improve the sales data and enhance the competitiveness of the supermarket.
17,017observations
| Name | Description | Data Type |
|---|---|---|
| Invoice id | Computer generated sales slip invoice identification number | Character |
| Branch | Branch of supercenter( identified by A, B and C) | Character |
| City | Location of supercenters | Character |
| Customer type | Type of customers, recorded by Members for customers using member card and Normal for without member card | Character |
| Product line | General item categorization groups - Electronic accessories, Fashion accessories, Food and beverages, Health and beauty, Home and lifestyle, Sports and travel | Character |
| Gender | Gender type of customer | Character |
| Unit price | Price of each product in $ | Numeric |
| Quantity | Number of products purchased by customer | Numeric |
| Tax: 5% | Tax fee for customer buying | Numeric |
| Total | Total price including tax | Numeric |
| Date | Date of purchase | Date |
| Time | Purchase time | Datetime |
| Payment | Payment used by customer for purchase (3 methods are available – Cash, Credit card and Ewallet) | Character |
| COGS | Cost of goods sold | Numeric |
| Gross margin percentage | Gross margin percentage | Numeric |
| Gross income | Gross income | Numeric |
| Rating | Customer stratification rating on their overall shopping experience (On a scale of 1 to 10) | Numeric |
By analyzing this dataset, I want to explore the impact of different factors on customer consumption and understand customers’ consumption habits, so as to guide the supermarket’s operational strategies and increase sales.
For example, here are the problems I plan to analyze:
Sales questions:
Among the three branches, which product line has the highest sales and profits? ( Knowing which product line generates the highest sales and profits, we can increase the purchase quantity of this part)
In the last three months, the sales trend of the three branches? ( Understand the sales trend of the three branches, and then take relevant measures to adjust after further analysis of the reasons)
In the last three months, the trend of member sales and non-member sales in the three branches? ( Understand the direct sales gap between members and non-members, and adopt crm strategy to improve member conversion)
Customer experience questions:
Which product line has the highest rating among the three stores? ( Understand which product line causes the lower rating, so as to improve the product line)
In the last three months, the rating trend of the three branches? ( Understand the customer rating trend of the three branches, and then take relevant measures to adjust after further analysis of the reasons)
Which time period has the most traffic? ( Through the analysis of the period of passenger traffic, it is possible to increase the staff in the peak period and improve the customer experience)
In the last three months, the members of the three branches gave a score below 7 points? ( Understand the situation of the members who gave a low score, timely follow up to understand the situation, and take measures to improve the retention of members)
In the last three months, the non-members of the three branches gave scores higher than 7 points? ( For non-members with scores higher than 7 points, it can be understood that they are satisfied with the supermarket, so corresponding strategies can be adopted to convert them into members)
Step1: Read the file and understand the data structure
Step2: Data cleansing - Delete empty data
Step3: Data cleansing - Converts data types
Step4: Data cleaning - Remove abnormal data
Step5: Remove useless columns ( Branch, Gender, Tax 5%, Payment, cogs, gross margin percentage)
Step6: Change column name ( Customer.type, Unit.price, Product.line, gross.income)
Step7: Add a new column “Month”, convenient for drawing plot.
You will find below the final result of the database.
gp <- supermarket %>%
group_by(City,Product_line) %>%
summarise(total_sales = sum(Total)) %>%
ggplot()+
geom_col(mapping = aes(x=Product_line, y=total_sales ,fill= City),color="black",size=0.5,position = "dodge" )+
theme_classic()+
scale_fill_brewer(palette = "Set3")+
theme(axis.text.x = element_text(angle = 10, vjust = 0.8))+
labs(x = "Product Line", y = "Total Sales", title = "Sales of Product Lines")
ggplotly(gp)
First insight: Naypyitaw store has the highest sales in the “Food and beverages” line, Yangon store has the highest sales in the “Home and lifestyle” line, while Mandalay has the highest sales in the “Sports and travel” line, Therefore, each store can increase the quantity of such products.
gp <- supermarket %>%
group_by(Month,City) %>%
summarise(total_sales = sum(Total)) %>%
ggplot()+
geom_line(mapping = aes(x=Month, y=total_sales ,color= City),size=1,position = "dodge" )+
theme_classic()+
scale_color_brewer(palette = "Set2")+
theme(axis.text.x = element_text(angle = 10, vjust = 0.8))+
scale_x_continuous(breaks = c(1, 2, 3))+
labs(x = "Month", y = "Total Sales", title = "Sales trend of each branch")+
geom_point(mapping = aes(x=Month, y=total_sales ),size=1,fill= "black" )
ggplotly(gp)
First insight: The sales volume of the three branches all fell sharply in February, and then rebounded in March, among which Yangon dropped the most. We can carefully analyze whether the supermarket made a big move in February, or whether there was a big change in the external market environment, which led to the sharp decline in sales volume.
gp <- supermarket %>%
group_by(Month,Customer_type,City) %>%
summarise(total_sales = sum(Total)) %>%
ggplot()+
geom_line(mapping = aes(x=Month, y=total_sales,color= Customer_type,linetype=City),size=1,position = "dodge" )+
theme_classic()+
scale_color_brewer(palette = "Set2")+
theme(axis.text.x = element_text(angle = 10, vjust = 0.8))+
scale_x_continuous(breaks = c(1, 2, 3))+
labs(x = "Month", y = "Total Sales", title = "Sales trend of each branch with different membership")
ggplotly(gp)
First insight: Members of Yangon Store have the best quality, and their sales have always been higher than Normal customers. However, in March, the quality of Members declined, and the sales volume of Normal customers was higher than that of Members. Therefore, marketing activities can be considered to convert these high-quality customers into Members.
gp <- supermarket %>%
group_by(City,Customer_type) %>%
summarise(mean_rating = mean(Rating)) %>%
ggplot()+
geom_col(mapping = aes(x=City, y=mean_rating ,fill= Customer_type),color="black",size=0.5,position = "dodge" )+
theme_classic()+
scale_fill_brewer(palette = "Set3")+
geom_hline(yintercept = mean(supermarket$Rating), color = "red", linetype = "dashed")+
geom_text(aes(x = 1.1, y = 7.3, label = paste("Average Rating:",mean(supermarket$Rating))))+
theme(axis.text.x = element_text(angle = 10, vjust = 0.8))+
labs(x = "Product Line", y = "Total Sales", title = "Sales of Product Lines")
ggplotly(gp)
First insight: The user ratings of Mandalay stores are all lower than the average line, so it is necessary to understand the reasons as soon as possible and take corresponding measures to prevent further customer loss. Both Naypyitaw and Yangon stores have very high Normal customer ratings, so marketing campaigns can be taken to convert these Normal customers into members.
Below is a first version of what the application could look like.
There are four main areas:
Area1: This area is Sidebar. Includes navigation menus, filters to control filters for charts and tables on the right to dynamically display different data.
Area2: This area is tabBox. I will use tabBox to control the two question directions “Sales questions” and “Customer experience questions” that I hope to study. Under each tab, there will be different tabPanels to control different charts or tables.
Area3: This area is tabPanel. This section will have different navigation content depending on the type of tabBox question above. After clicking, different graphs or tables will be displayed, and each navigation will do a topic that I want to study
Area4: This area is Body. It contains charts, tables, etc., used to show the main content of the application, the most core chart information will be shown here for our analysis, and there will be a conclusion below the chart.